Association mapping with a diverse population of Puccinia graminis f. sp. tritici identified avirulence loci interacting with the barley Rpg1 stem rust resistance gene

Background Wheat stem rust, caused by Puccinia graminis f. sp. tritici (Pgt), is an important disease of barley and wheat. A diverse sexual Pgt population from the Pacific Northwest (PNW) region of the US contains a high proportion of individuals with virulence on the barley stem rust resistance (R) gene, Rpg1. However, the evolutionary mechanisms of this virulence on Rpg1 are mysterious considering that Rpg1 had not been deployed in the region and the gene had remained remarkably durable in the Midwestern US and prairie provinces of Canada. Methods and results To identify AvrRpg1 effectors, genome wide association studies (GWAS) were performed using 113 Pgt isolates collected from the PNW (n = 89 isolates) and Midwest (n = 24 isolates) regions of the US. Disease phenotype data were generated on two barley lines Morex and the Golden Promise transgenic (H228.2c) that carry the Rpg1 gene. Genotype data was generated by whole genome sequencing (WGS) of 96 isolates (PNW = 89 isolates and Midwest = 7 isolates) and RNA sequencing (RNAseq) data from 17 Midwestern isolates. Utilizing ~1.2 million SNPs generated from WGS and phenotype data (n = 96 isolates) on the transgenic line H228.2c, 53 marker trait associations (MTAs) were identified. Utilizing ~140 K common SNPs generated from combined analysis of WGS and RNAseq data, two significant MTAs were identified using the cv Morex phenotyping data. The 55 MTAs defined two distinct avirulence loci, on supercontig 2.30 and supercontig 2.11 of the Pgt reference genome of Pgt isolate CRL 75-36-700-3. The major avirulence locus designated AvrRpg1A was identified with the GWAS using both barley lines and was delimited to a 35 kb interval on supercontig 2.30 containing four candidate genes (PGTG_10878, PGTG_10884, PGTG_10885, and PGTG_10886). The minor avirulence locus designated AvrRpg1B identified with cv Morex contained a single candidate gene (PGTG_05433). AvrRpg1A haplotype analysis provided strong evidence that a dominant avirulence gene underlies the locus. Conclusions The association analysis identified strong candidate AvrRpg1 genes. Further analysis to validate the AvrRpg1 genes will fill knowledge gaps in our understanding of rust effector biology and the evolution and mechanism/s of Pgt virulence on Rpg1. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10670-y.


Background
Puccinia graminis f. sp.tritici (Pgt) is the causal agent of the disease wheat stem rust on barley (Hordeum vulgare L.) and wheat (Triticum aestivum L.) [1,2].Historically, Pgt caused economically devastating epidemics on both cereal crops around the globe [1,3,4].During the second half of 20 th century, Pgt was managed through the deployment of effective genetic resistance in barley and wheat [4,5].The removal of the sexual alternate host, barberry (Berberis vulgaris), in the Midwestern United States (US) and Western Europe nearly eliminated the Pgt sexual cycle effectively stabilizing Pgt populations in these regions by limiting the development of new virulence gene recombination [1,6].However, the evolution of new virulent pathotypes like TTKSK (a.k.a Ug99) in Africa and the speed of dispersal across the continent reiterated the need for vigilant monitoring of Pgt sexual populations that can rapidly evolve virulence patterns that threaten global barley and wheat production [2,7].Recent stem rust outbreaks in Western Europe and their association with the sexual host, barberry, is alarming since genetic recombination during the sexual cycle can generate genotypes with novel virulence gene combinations [8][9][10].
The Pacific Northwest (PNW: US states Washington, Idaho, Oregon, and Canadian province British Columbia) region of North America serves as a center of Pgt diversity on the continent [11][12][13][14][15].In the PNW, Pgt cycles between cereals, native grasses, Mahonia, and barberry shrubs to complete its sexual spore forming stage [12][13][14][15].Recently, the native shrubs endemic to the woodland areas of the PNW, Mahonia repens and Mahonia auilifolium, were shown to serve as sexual hosts of Pgt in addition to B. vulgaris that survived eradication in the PNW [13].Virulence characterization of Pgt isolates collected from Eastern Washington on the two major barley R-genes, Rpg1 and the rpg4/5-mediated resistance locus (RMRL) [15] identified isolates that are virulent on barley line Q21861, which carries both R-genes.This was the first report of Pgt races or isolates that were virulent on both major barley stem rust R-genes when stacked together representing the most virulent Pgt population on barley from across the globe.
We hypothesized that genetic reshuffling during sexual reproduction of Pgt allowed both virulence genes to recombine in the PNW Pgt genotypes as isolates virulent on each R-gene are also present in the population at higher frequency than the isolates virulent on both R-genes.These virulent genotypes not only pose a threat to the Washington barley industry but could be wind disseminated over the Rocky Mountain and threaten production in other major barley producing regions of North America.Indeed, the Pgt race QCCJB, the first North American isolate with virulence on barley lines containing Rpg1, that caused epidemics on barley and wheat during 1990s in the Midwestern US was hypothesized to have originated from the PNW region [12].
Rpg1, RMRL, and the recently identified Rpg7 are the three major wheat stem rust resistance genes in barley that confer all stage resistance to many North American pathotypes [5,[16][17][18].Of these R-genes, Rpg1 is the only one deployed in commercial barley cultivars grown in the Northern Great Plains and Canadian Prairie provinces [5,16].Since the 1950s Rpg1 provided remarkably durable resistance for nearly 50 years, until Pgt race QCCJB emerged in the Northern Great Plains with virulence on barley lines containing Rpg1 [5,12].Rpg1 was mapped to the telomeric region of chromosome 7H [19] and cloned via a map-based strategy [16].Rpg1 encodes a dual kinase domain protein with a pseudokinase domain (pK1) and an active kinase domain (pK2), that are both required for resistance [16].Previously, two effectors, RGD and VPS9 corresponding to the barley Rpg1 gene were identified from Pgt race MCCF [20,21].Inoculation with Pgt race MCCF containing these effectors resulted in phosphorylation of RPG1 as early as 5 min post inoculation, before spore germination (30 min), suggesting that these effectors are very early elicitors of the Rpg1-mediated resistance response.
Stem rust virulence/avirulence effector identification has been slow partly due to the inability to culture the obligate biotrophic pathogen outside its hosts [22].Till now, seven Pgt effector genes have been identified including AvrSr35 [23], AvrSr50 [24], AvrSr27 [25], AvrSr13c [26], AvrSr22 [26], RGD [20,21], and VPS9 [20,21].Several approaches including transcriptomics, bi-parental mapping, and comparative genomics have been used to identify candidate effector loci or genes in fungal pathogens [27][28][29][30].However, advances in high-throughput DNA sequencing have enabled whole genome sequencing of fungal pathogens at the population level, producing high density SNP markers which allow for high-resolution mapping via genome wide association studies (GWAS).This advancement in the high-resolution genetic characterization of fungal plant pathogen populations for host-pathogen genetic interactions allows for the rapid identification of candidate virulence/avirulence effector genes [31][32][33][34][35]. GWAS has been extensively used in plants and animals [36] but recently became a powerful tool to genetically map virulence/avirulence loci in plant fungal pathogens [31-35, 37, 38].For example, [32] conducted a GWAS study on a natural population of Parastagonospora nodorum, a necrotrophic fungal pathogen of wheat, and identified novel virulence loci along with the previously described effector genes, SnToxA and SnTox3.Similarly, in another haploid fungal pathogen of wheat, Zymoseptoria tritici, the avirulence gene Zt_8_609 was discovered through genome wide association analyses [31].Although the GWAS approach has increasingly been utilized in haploid plant fungal pathogen populations the approach has seen little utilization in the dikaryotic cereal rusts.Recently, a similar approach, transcriptome wide association study (TWAS), was performed using 24 Pgt isolates to identify 33 variants within 28 genes that were associated with virulence on the barley rpg4/Rpg5-mediated resistance locus (RMRL) [30].In a separate GWAS using 96 Pgt isolates, 17 Pgt loci were found to be associated with six stem rust resistance genes in wheat [34,35].Once identified, avirulence or virulence effectors can be utilized to aid in the deployment of broad-spectrum resistances in crops by accelerating R-gene identification, identifying specific interactions with host virulence targets, and providing an effective means to screen breeding materials if an effective delivery method is available by delimiting single gene-for-gene interactions [39].Effector characterization can also help elucidate evolutionary mechanism of virulence acquisition in pathogens [23][24][25]40].For example, allele analysis of AvrSr35 among virulent and avirulent Pgt isolates identified that the insertion of a mobile element into AvrSr35 resulted in virulence on the wheat Sr35 gene [23].
The Pgt population characterized from eastern Washington in the PNW is dominated by genotypes that are virulent on barley lines containing Rpg1 (99%), contain a high proportion of individuals virulent on RMRL (16%), and 10% of the population is virulent on barley containing both Rpg1 and RMRL [15].This raises the question of how this high level of virulence on barley became predominant in the population considering that neither the Rpg1 nor RMRL resistances were deployed in the region.Thus, the overarching goal of this study was to identify candidate Rpg1 virulence/avirulence effector genes in order to begin filling knowledge gaps in stem rust effector biology and the evolutionary processes that led to this high level of virulence on barley.Utilizing whole genome sequencing, RNAseq data and infection type data for 113 Pgt isolates, 55 significant MTAs were identified corresponding to two unique loci in the Pgt genome that putatively contain AvrRpg1 effectors.Thus, Pgt loci that evolved to overcome Rpg1 resistance were characterized using genome wide association studies (GWAS) and candidate AvrRpg1 genes underlying these loci identified.

Barley genotypes
Five barley genotypes including the transgenic line H228.2c and cultivars (cvs) Morex, Golden Promise, Harrington, and Steptoe were used in this study (Table 1).The Golden Promise transgenic line (H228.2c)and cv Morex (CIho 15773) represent genotypes containing Rpg1 in distinct genetic backgrounds.Morex is a natural source of the Rpg1 gene from which the gene was originally cloned [16].The Golden Promise transgenic line (H228.2c)carries a single copy of the Rpg1 gene from cv Morex in the susceptible cv Golden Promise (GP) background [41] (Table 1).Wildtype cv Golden Promise (PI 343079) is a two-rowed malting barley developed by the Miln Marsters seed company Cheshire, UK using mutational breeding.Morex, is a six-rowed malting variety released by the University of Minnesota [42].Steptoe (CIho 15229) is a six-rowed feed barley released by Washington State University [43].Harrington is a two-rowed spring malting barley developed at the University of Saskatchewan [44].Barley cultivars Steptoe, Harrington, and Golden Promise were used as susceptible checks because they are considered universal stem rust susceptible checks that do not carry any known stem rust R-genes [15] (Table 1).Details on the source, improvement status, and R-gene of the plant materials are provided in Table 1.

Puccinia graminis f. sp. tritici isolates
A total of 113 Pgt isolates were utilized in this study, including 89 from the Pacific Northwest region (PNW) (Washington and Idaho) and 24 from the Midwest (North Dakota) region of the U.S (Supplementary Table S1).The PNW isolates were derived from stem rust samples collected from diverse hosts including barley, wheat, Mahonia, and barberry during the 2019-2020 growing seasons (Supplementary Table S1).The Midwestern isolates were collected as part of cereal rust surveys conducted in North Dakota from 1977 to 1999.Details on isolates are presented in Supplementary Table S1.

Phenotype assays
Disease phenotype data for 96 of the Pgt isolates was generated based on their infection types on the barley lines; WT cv Golden Promise (Rpg1 -), the cv Golden Promise transgenic line H228.2c(Rpg1 +), cv Morex (Rpg1 +), cv Steptoe (Rpg1 -), and cv Harrington (Rpg1 -).We also utilized previously generated phenotype data on cv Morex for 17 additional Midwest isolates.Thus, disease phenotypes on cv Morex (Rpg1 +) was obtained for 113 Pgt isolates.Two Pgt races QCCJB and HKHJ with known disease reactions on the aforementioned barley lines were used as controls.Pgt race QCCJB is virulent and Pgt race HKHJ is avirulent on barley lines containing Rpg1 [15].Two seeds of each barley line were grown in containers filled with potting soil mix (Sungro Horticulture, MA, USA).Host genotypes were replicated three times in a completely randomized design.Each phenotype assay was conducted twice.The seedlings were grown in environmentally controlled growth chambers set at 19 ± 1ºC with a 16-h (400 µm/m 2 ) light and 8 h dark cycle.Inoculations were performed when primary leaves were fully expanded (~ 9 days after planting).Seedlings were inoculated with a suspension of light mineral oil and urediniospores (0.05 mg urediniospore/plant) using atomizers pressured by a pump set at 30 kPa [15].After inoculation, plants were kept in mist chambers set at 19ºC, and 100% RH with complete darkness for 18 h to facilitate the infection process.Seedlings were returned to the growth chamber and disease ratings taken at 14 days after inoculations (DAI).
Stem rust infection types (ITs) were assessed for primary leaves at 14 DAI using the "modified 0-4 scale", originally developed for wheat by [45].The modification of this 0-4 scale for barley were described by [46] and were based on the uredinial sizes on barley leaves.The description of each IT on the 0-4 scale for barley is explained by [47].Barley shows a mesothetic reaction to Pgt where multiple ITs are observed on a single leaf.In this case, ITs were recorded in their order of prevalence.Then, the categorical phenotype scores of 0 to 4 were converted into numeric quantitative scores of 0 to 5 for the ease of virulence interpretation and association analysis [48].A numeric quantitative disease score of < 3 was considered avirulent, and > 3 as virulent.

Puccinia graminis f. sp. tritici genotyping Pathogen DNA extraction
Genomic DNA (gDNA) of each isolate (n = 96) was extracted from urediniospores (~ 30 mg/isolate) using the Quick-DNA Fungal/Bacterial Miniprep kit (Zymo Research) following the manufacturer's recommended protocol.The integrity of gDNA was assessed on 1.2% agarose gel stained with gel red.gDNA with a high molecular weight band (~ 10-15 kb) without smearing was considered as high-quality gDNA.gDNA for each isolate was quantified on the Qubit 4.0 fluorometer using the Broad Range assay kit (ThermoFisher Scientific).The purity of gDNA samples were assessed using a NanoDrop 1000 Spectrophotometer (Ther-moFisher Scientific).gDNA samples with 260/280 ratio of 1.8 ± 0.1 and 260/230 ratio of 2 ± 0.1 or above were considered pure DNA samples and utilized for whole genome sequencing library preparation.

WGS library preparation and sequencing
Whole genome shotgun sequencing libraries were prepared for each of 96 isolates using the FS DNA library prep kit (NEB, New England Biolabs) following the manufacturer's recommendations (Table 2).Briefly, gDNA (500 ng/sample) was randomly sheared, adaptor-ligated, size-selected, barcoded, PCR enriched, and bead cleaned to generate libraries with 250-300 bp insert sizes.Each isolate library was barcoded with unique dual indexes to facilitate multiplexing of samples for sequencing.Library fragment sizes were determined using the DNA 1000 assay on an Agilent 2100 bioanalyzer system (Agilent technologies).Libraries with a single main peak around 370-420 bp and without primer (80 bp) and adapter-dimer (128 bp) peaks were considered quality libraries for sequencing.Libraries (n = 96) were normalized to a 5 nM concentration which was based on the average fragment size (bp) as determined by bioanalyzer data and DNA concentrations measured by the Qubit broad assay.Normalized libraries were pooled together and an aliquot of 50 µl at 5 nM concentration was sent to Novogene corporation (Sacramento, CA, USA) for sequencing.At Novogene, the concentration of effective library was determined by qPCR assays before sequencing.A Table 2 Summary of sequencing strategy for Puccinia graminis f. sp.tritici isolates used in this study a a A total of 96 isolates were genotyped by whole-genome shotgun sequencing (WGSS) approach.Seventeen isolates from ND, Midwest were genotyped by RNA sequencing approach [30].WA, ID, and ND refer to the states of Washington, Idaho, and North Dakota, respectively, from where isolates were collected

Number of isolates
Genotyping Location

(sub-total)
150 bp paired-end sequencing run was performed utilizing a single lane on a Novoseq 6000 sequencer (Illumina platform).

Quality control, mapping and variant calling for WGS data
The quality of raw sequencing reads was examined using FastQC (v0.11.9) and low-quality reads were filtered out using Fastp software (v0.22.0)[49].The Pgt isolate CRL 75-36-700-3 reference genome was indexed using Burrows-Wheeler Alignment (BWA) tool (v0.7.17).The Pgt isolate CRL 75-36-700-3 genome assembly represents the collapsed haploid assembly of two karyons of dikaryotic urediniospores with 392 supercontigs and a total assembly size of 88.72 Mbp [50].Quality reads were mapped to the indexed reference genome using the BWA-mem algorithm with default settings [51].Alignment files in BAM format were coordinate sorted, indexed, and then duplicates were marked and removed using the Picard tools (v2.25.4).Variants were detected using the GATK (v4.2.5.0) tool.Briefly, per sample variant calling was done using GATK HaplotypeCaller then calls for all samples were merged using CombineGVCFs, and finally joint variants were called with the GenotypeGVCFs function of GATK.The raw variants were split into SNPs and INDELs using the SelectVariants function of GATK.

RNAseq library preparation and sequencing
RNA extraction, cDNA synthesis, library preparation and sequencing for 17 Midwest isolates was previously reported [30] (Table 2).In this study we utilized previously generated transcript sequence data to identify variants for 17 Midwest Pgt isolates.

Quality control, mapping and variant calling for RNAseq data
Quality control of raw sequencing reads was done as described for the WGS data.Quality reads were mapped to the CRL 75-36-700-3 reference genome using STAR mapper (v2.7.10a), a splice aware alignment tool [52].Briefly, a genome index was build utilizing both reference assembly and gene annotation.Then, trimmed reads were mapped to the reference genome with option "--twopass-Mode Basic", to improve alignment to the splice junctions.Alignment files in BAM format were coordinate sorted and indexed as described for the WGS data.Variant calling was done using GATK (v4.2.5.0).Before variant calling, RNAseq alignment was reformatted using the SplitNCigarReads function of GATK.This generated an alignment where reads containing Ns at the splicing events were split and mapping qualities reassigned to match DNA convention for GATK, HaplotypeCaller.Per sample variants were called using Haplotypecaller.Then, the sample variants of the 17 Midwestern isolates were merged with variants of 96 whole genome sequenced PNW isolates.Finally, joint variant calling was done for all 113 isolates using the GenotypeGVCFs function of GATK.Then, raw variants were processed further following the same procedure and parameters described for WGS.SNP variants obtained by combining RNAseq and WGS data were used for identification of markertrait associations (MTAs) with IT data generated for cv Morex.

Variants effect prediction
The SNP and Indel effects were determined and annotated using SnpEff (v5.1) [53].The SnpEff tool categorizes variant effects by impact into four classes: High, Moderate, Low, and Modifier.The High and Moderate impact variants were considered detrimental, and their densities were determined on each supercontig (n = 392) of the CRL 75-36-700-3 reference genome.A detailed explanation on different types of variants under these four classes can be found at http:// snpeff.sourc eforge.net/ VCFan notat ionfo rmat_ v1.0.pdf.

Principal component analysis (PCA)
PCA was performed on a reduced set of SNPs because of the computational load and run times with large data sets.Ten percent of the SNPs were randomly selected using the SelectVariants function of GATK.PCA was performed using the package SNPRelate [55] to describe the population structure of Pgt isolates (n = 96) for which genome wide SNP variants were available.The percentage of variance explained by each principal component was determined using the eigen values from PCA result.
Finally, new projected dimensions were visualized using the ggplot2 package in R.

Pgt isolates relatedness
Relatedness among the Pgt isolates was assessed utilizing 10% randomly distributed SNPs because of computational difficulty to handle a large distance matrix.Genetic distance between Pgt isolates (n = 96) was calculated using the 'gdist' function of the NAM package [56] in R. Based on the genetic distances of Pgt isolates clustering was performed using the hclust function and ward method in R. A dendrogram was constructed using "dendextend" package [57] with minimum number of clusters (k = 3) based on the lower Bayesian Information Criterion (BIC) value to evaluate and visualize relatedness among isolates.Finally, a dendrogram was generated in circular form using the circlize package [58] in R.

Linkage disequilibrium (LD) calculation
LD was computed as squared allele frequency correlations (R 2 ) between intrasupercontig marker pairs using a sliding window size of 50 markers surrounding the current site in the Tassel software (v5.0) [59].LD was performed on a reduced marker set (10%) for computational efficiency.To understand the pattern of LD decay, a non-linear model, y = log(x) was fitted, where x denotes distance between marker pairs (in kb) and y denotes R 2 value between marker pairs [34,35].

Genome wide association studies (GWAS)
Association analysis between genotype and phenotype was performed using genome association and prediction integrated tool (GAPIT) (v3.3) [60].Quantitative disease scores were utilized as phenotype data and SNPs as genotype data.The mixed linear model (MLM), which can incorporate principal components (PCs), (Q) as fixed effect covariate and kinship/relatedness (K) as random effect covariate was run for association analysis [61].The number of PCs that explained at least 25% variation were included in the model.Similarly, GAPIT by default incorporated the kinship data generated with the VanRaden function within the MLM model.If no significant association was detected with the MLM model, an additional model, BLINK was run.For the BLINK model [62], only the PCs could be incorporated as covariates.The Bonferroni correction was applied to p-value (0.05) to prevent false association due to multiple testing of markers.The corrected p-value was calculated by dividing the generic p-value (0.05) by the total number of tested markers.The markers were considered significantly associated with phenotype only when the p-value associated with markers was lower than the corrected p-value.

Candidate effector identification
The logarithm of the odd (LOD) scores were calculated [LOD = -log10(p-value)] for all significant MTA SNP markers present on the Pgt isolate CRL 75-36-700-3 reference genome supercontigs.The LOD scores were plotted along the physical position of markers to identify regions that harbor significant MTA (Supplementary Fig. S1).SNP markers flanking the significant MTA that fell below the significant threshold level were identified as flanking markers to delimit the physical regions containing candidate AvrRpg1 genes (Supplementary Fig. S1).
The predicted gene models within the delimited regions were identified based on the Pgt isolate CRL 75-36-700-3 reference genome [50] annotations and designated as candidate AvrRpg1 genes.[40] for which a haplotype phased chromosome level assembly is available.Briefly, the Pgt21-0 genome assembly was obtained from the JGI MycoCosm web portal (https:// mycoc osm.jgi.doe.gov/ mycoc osm/ home/ relea ses? flt= pucci nia+ grami nis) and a local nucleotide blast database was created using the makeblast function of the BLAST (v2.5.0 +) toolkit.Candidate gene sequences were blasted against the local Pgt21-0 nucleotide database using the blastn function of BLAST to determine presence/absence and chromosomal location of genes.Protein homologs of candidate genes were searched using the BLASTP program in the NCBI protein database at a threshold level of 90% identify with 90% query coverage.We also blast searched the candidate gene sequences against the reference assembly to find if these genes evolved through duplication events.Significant MTAs were aligned and manually assessed to determine the haplotype state, homozygous vs heterozygous reference or alternate allele, for each locus among virulent and avirulent isolates to predict the dominant nature of avirulence or virulence.For the most significant SNP on each gene, the correlation between SNP haplotypes and phenotypes was computed using the point bi-serial method in R. A Kruskal-Wallis test was performed to determine if the haplotypes of the most significant SNP on each gene differed statistically for disease phenotypes.

Pathogen phenotype
Each isolate was phenotyped for disease infection types (ITs) on the five barley genotypes; WT cv Golden Promise (Rpg1 -), the cv Golden Promise transgenic line H228.2c(Rpg1 +), cv Morex (Rpg1 +), cv Steptoe (Rpg1 -), and cv Harrington (Rpg1 -).For the ease of virulence interpretation, ITs were converted to numeric disease scores of 0-5.Fifty-seven percent of the isolates (n = 96) were virulent on the cv Golden Promise transgenic line H228.2c(Rpg1 +) (Fig. 1, Supplementary Table S2).Disease scores of the 96 Pgt isolates on H228.2c ranged from 0.17 to 3.69, with an average of 2.14.(Fig. 1).Eightythree percent of the isolates (n = 113) showed virulence on cv Morex (Rpg1 +), carrying the natural source of the Rpg1 gene (Fig. 1, Supplementary Table S2).Disease scores of the 113 Pgt isolates on cv Morex ranged from 0.75 to 4.07, with a mean of 3.14 (Fig. 1).H228.2c exhibited a strong immune reaction (ITs-0, 0; or 0;1) against all avirulent isolates while cv Morex did not show these high levels of resistance in response to the avirulent isolates (Fig. 1).Two control Pgt races QCCJB and HKHJ exhibited disease phenotype as expected where Pgt race QCCJB was virulent on H228.2c and cv Morex and Pgt race HKHJ was avirulent on both H228.2c and cv Morex.The susceptible checks, cvs Steptoe, Harrington, and Golden Promise, displayed susceptible reactions to Pgt races QCCJB and HKHJ as expected.However, the average disease score of the Pgt population was comparatively higher on Steptoe than Harrington and Golden Promise (Fig. 1).

Sequencing and mapping statistics
Whole genome sequencing of the 96 Pgt isolates primarily collected from the PNW yielded a total of 6.rate ranged from 82 to 94% with an average of 93% (Fig. 2B).The estimated genome coverage when computed with mapped reads, ranged from 50 × to 106 × with an average of 79 × per sample (Fig. 2A).We extracted the unmapped reads from samples with < 90% mapping rate (n = 4).These reads predominantly represented thrip (Frankliniella occidentalis), barley (Hordeum vulgare Subsp.vulgare), and bacterial (Pantoea spp.) genomic DNA sequences.These contaminants were probably introduced during rust isolate increase and subsequent handling in the greenhouse.
For the 17 Midwest isolates, an RNAseq approach was used generating a total of 0.7 billion single end (SE) reads on an Illumina NextSeq 500 sequencer.The number of raw reads ranged from 33 to 82 million per sample with an average of 44 million.Seedling primary leaf tissues infected with each of the 17 Midwest Pgt isolates were utilized for RNA extraction, hence the raw data included both Pgt and barley RNAseq reads.The percentage of quality trimmed reads that mapped to the CRL 75-36-700-3 reference genome ranged from 5 to 69% per sample with an average of 43%.

Variant statistics
Mapping of WGS reads to the CRL-75-36-700-3 reference genome and subsequent variant calling identified 1,195,947 SNPs and 168,516 Indels among the 96 Pgt isolates.The densities of variants were computed on each supercontig (n = 392) to determine genome-wide variant distribution (Fig. 3).SNP densities varied from 0 to 38 Fig. 2 Read coverage and mapping summary for Pgt isolates (n = 96) sequenced using the whole genome sequencing approach.A Red line with dots indicates the genome coverage computed for each Pgt isolate using raw sequencing reads.Blue line with dots shows the genome coverage computed with mapped reads after quality control.B Blue and yellow dots indicate the percentage of mapped and unmapped reads, respectively to the reference genome, CRL 75-36-700-3.Each dot represents a Pgt isolate SNPs/Kb across the supercontigs (Fig. 3).Smaller supercontigs (< 250 Kb) had uneven SNP and Indel densities compared to longer supercontigs (Fig. 3).The densities of Indels ranged from 0 to 8 Indels/Kb across the supercontigs (Fig. 3).The size of Indels extended from 1 to 60 bp in length with the most frequent being 1 bp (33%) followed by 2 bp (20%), and 3 bp (14%).Smaller Indels (< 10 bp) accounted for 91% of total identified Indels.
Joint variant calling using the WGS for the 96 Pgt isolates and RNAseq data for the 17 Midwestern isolates yielded a total of 136,391 SNPs and 8,238 Indels among the total of 113 Pgt isolates.Lower number of variants were detected in this combined analysis compared to only WGS data because the RNAseq alignments of the additional 17 isolates limited the variant detection to genic regions.Also, we did not permit any missing data, so variants identified outside genic regions based on WGS data were filtered out.

Variants effect predictions
The effects of variants were predicted using SnpEff.This tool outputs all possible consequences that can be caused by a variant in a gene's transcript isoforms or in multiple genes sharing the same promoter sequence.Thus, the total number of effects was greater than the total number of variants.The 1,195,947 SNPs and 168,516 Indels caused 4,012,622 and 643,357 effects, respectively.Variant effects were further categorized as high (0.07%), moderate (2.91%), low (4.60%), and modifier (92.4%) based on the impact they were predicted to cause in the genic regions.The densities of high and moderate impact SNPs were computed and visualized across each supercontig (Fig. 3).The ratio of missense to silent mutations was 0.71 for SNP variants.A total of 1,911 nonsense effects (stop gained) were caused by SNPs across the Pgt genome.The effects of SNPs were mostly in downstream (34%) and upstream regions (34%) followed by intergenic (18%), exon (7%), intron (3%), and others (4%).High, moderate, low, and modifier Indel effects accounted for 1.36%, 0.86%, 0.47%, and 97.29%, respectively, of the total predicted effects.Densities of high and moderate impact indels were calculated and visualized across each supercontig (Fig. 3).Indel effects were primarily in upstream (36%) and downstream (35%) regions followed by intergenic (19%), intron (4%), exon (2%) and others (4%).Indels caused a total of 8,250 frameshift mutation within predicted genes across the genome.

Principal component analysis
Principal component analysis was done to infer the structure of the Pgt population (n = 96).The first three principal components (PCs) explained 26.5% of the variance in the Pgt population structure (Supplementary Fig. S2).The percentage of variance explained by PC1, PC2, and PC3 was 15.60%, 6.35%, and 4.55%, respectively.PC1 vs PC2 and PC2 vs PC3 were visualized with biplots.(Supplementary Fig. S2).

Relatedness
Genetic relatedness among the Pgt isolates (n = 96) was evaluated based on genetic distance using SNP genotype data and visualized with a dendrogram (Fig. 5).Three Fig. 5 Dendrogram showing genetic relatedness among the Pgt isolates (n = 96) used in this study.Three major clusters were observed and are indicated by blue, green, and pink colored branches.The first, second, and third cluster includes 63, 26, and 7 Pgt isolates, respectively.Red and black labels in the outermost layer represent isolates collected from alternate and cereal hosts, respectively.Isolates avirulent to Rpg1_Golden Promise transgenic line H228.2c(GPT) are indicated with a star major clusters were identified within the Pgt population based on the lowest BIC value (Fig. 5).The first cluster comprised of 63 Pgt isolates including 42 derived from alternate hosts and 21 from cereal hosts (Fig. 5).Twentysix isolates were grouped into the second cluster comprised of 8 isolates collected from alternate hosts and 18 from cereal hosts (Fig. 5).The third cluster consisted of 7 isolates obtained from cereal hosts only (Fig. 5).The first two clusters were comprised of both virulent and avirulent Pgt isolates to the Rpg1_trangenic line, H228.2c (Rpg1 +) (Fig. 5).However, the third cluster included only avirulent isolates to the Rpg1_transgenic line H228.2c(Fig. 5).Six Midwest isolates (R29J, A-14, 370-c, A-15, TMNK, and A-21) formed a separate sub-group within the first cluster (Fig. 5).Interestingly, one Midwest isolate (QCCJB) collected from cereal host was more closely related to sexual isolates from the PNW (Fig. 5).

Linkage disequilibrium (LD)
LD values (R 2 ) were computed from 1.5 million marker comparisons using 119,595 SNP sites (10% of total SNP markers) generated from the GWS of the 96 Pgt isolates.A scatter plot was generated by plotting LD (R 2 ) values against physical distance (Kb) between marker pairs [34,35].The non-linear model [y = log(x)] described the genome-wide LD (R2) pattern as shown by the red line in figure 4.7.Based on the model, LD decayed to the R 2 value of 0.3, at physical distance of ~ 10 kb [34,35].The extent of LD was detected up to 200 kb [34,35].
The MTA on supercontig2.11was within the predicted gene PGTG_05433, and based on flanking non-significant SNPs the locus containing this MTA was delimited to 78 bp.This locus was designated AvrRpg1B.

Candidate loci and gene characterization
The genome architecture of the 35 kb AvrRpg1A locus is shown in Fig. 7. Gene density in the genomic region was comparable (4 per 35 kb) to gene density (7 per 35 kb) across the supercontig2.30.The region harbored 102 repeats ranging from 8 to 644 bp, with repeats < 20 bp being the most common.(Fig. 7).SNPs and Indel densities ranged from 0 to 17 and 0 to 4 per 200 bp, respectively (Fig. 7).Candidate effector genes, PGTG_10878, PGTG_10884, PGTG_10885, and PGTG_10886 encode predicted proteins of 233, 386, 460, and 725 amino acids, respectively (Table 5).PGTG_10878 and PGTG_10884 did not share protein homology with other fungal species and were specific to Pgt (Table 5).Protein homologs of PGTG_10885 were found in seven rust species including Puccinia striiformis f. sp.tritici, Puccinia triticina, Austropuccinia psidii, Melampsora americana, Melampsora larici-populina, Cronartium quercuum f. sp.fusiforme, and Puccinia sorghi (Table 5).PGTG_10886 was homologous to proteins of P. triticina and P. striiformis f. sp.tritici (Table 5).A BLAST search of candidate genes against the same reference genome (CRL 75-36-700-3) indicated no duplication events.The CRL 75-36-700-3 reference genome assembly limited the identification of gene space to the supercontig level without anchoring at the chromosome level.Hence, to find the chromosomal location of candidate genes we used the Australian isolate Pgt21-0 with complete chromosomes assembled.BLAST searches of the four AvrRpg1A candidate gene sequences against the Pgt21-0 assembly indicated that these genes are located at the telomeric region of chromosome 2 (haplotype B) from 676,683 bp to 700,790 bp.Two loci were identified with the cv Morex phenotyping data, one colocalizing with the 35 kb AvrRpg1A locus on supercontig2.30and the second delimiting the AvrRpg1B locus to a 78 bp region on supercontig2.11.The AvrRpg1B locus on supercontig2.11harbored a single gene, PGTG_05433, which was found to be unique to Pgt as determined by protein homology searches in the NCBI protein database (Table 5).A local BLAST search of PGTG_05433 sequence against Pgt21-0 genome assembly revealed three hits on chromosomes 11, 4, and 3 (haplotype B), so a precise chromosomal position could not be determined.

Significant markers on candidate genes and regulatory regions
Among the 53 MTAs identified with Rpg1_transgenic line H228.2c,26 MTAs were within genic or regulatory regions (500 bp upstream/downstream) of four candidate genes and the remaining 27 markers were located in intergenic regions.Candidate genes PGTG_10878, PGTG_10884, PGTG_10885, and PGTG_10886 harbored 1, 0, 1 and 11 SNP variants in genic and 1, 4, 1, and 7 variants within putative regulatory regions, respectively (Fig. 8).Among the 13 SNPs in genic regions only two were located in exons, one in the first exon of PGTG_10878 and another in exon thirteen of PGTG_10886 (Fig. 8).The exonic SNP in PGTG_10878 resulted in a predicted nonsynonymous mutation where asparagine was substituted by aspartic acid.The other exonic SNP in PGTG_10886 was a predicted synonymous mutation (tyrosine to tyrosine).
Among the two MTAs identified with Rpg1_Morex, one was within the regulatory region of gene model PGTG_10886 and another in the exonic region of PGTG_05433 (Fig. 8).The SNP on PGTG_05433 caused a predicted nonsynonymous mutation where arginine was replaced by lysine.

Allele analysis
The genotypes of significant SNPs identified with Rpg1_ transgenic line H228.2cwere in homozygous or heterozygous state in the majority of avirulent isolates while the alternate homozygous state was identified for virulent isolates (Supplementary Table S3).This indicated that AvrRpg1A-mediated avirulence is dominant and virulence is recessive in the Pgt/Rpg1_H228.2cinteraction.Interestingly, the majority of avirulent isolates were in heterozygous state for the significant MTAs (Supplementary Table S3).All eighteen MTAs located within genic or regulatory regions of the gene PGTG_10886 had near perfect correlations with the phenotypes for the Pgt isolates, with a correlation coefficient (|r|) of 0.95 for the most significant SNP in the gene (Supplementary Table S3; Fig. 9).Similarly, all eight MTAs within genic or regulatory regions of the other three candidate genes PGTG_10878, PGTG_10884, and PGTG_10885 also had good correlations with observed phenotypes, with the correlation coefficients ranging from 0.72 to 0.95 for the most significant SNP in each gene (Supplementary Table S3; Fig. 9).For both the significant MTAs identified with Rpg1_Morex, the MTA in the regulatory region of the gene model PGTG_10886 and the MTA within the PGTG_05433, moderate correlations (|r|= 0.45 to 0.52) were observed with disease phenotypes (Supplementary Table S4, Fig. 9).The haplotype associated with PGTG_10886 was in heterozygous or homozygous state in 20 avirulent isolates and in alternate homozygous form in 53 virulent isolates (Supplementary Table S4; Fig. 9).These data suggest that avirulence is dominant and virulence is recessive in the Pgt/Rpg1_Morex interaction as was determined for the Pgt/Rpg1_H228.2cinteraction.

Discussion
The first North American isolate collected from barley, designated Pgt race QCCJB based on wheat stem rust differential lines, identified as virulent on barley lines containing Rpg1 was hypothesized to have originated from Table 5 Summary of candidate effector gene, encoded protein, and protein homology a a Candidate effector genes, PGTG_10878, PGTG_10884, PGTG_10885, and PGTG_10886 correspond to Rpg1_H228.2c.Similarly, candidate genes, PGTG_10886 and PGTG_05433 correspond to Rpg1_Morex.Gene and protein information is based on the annotation of reference genome of the Puccinia graminis f. sp.tritici isolate, CRL 75-36-700-3) [50].Protein homology search was done using blastp tool of NCBI with a threshold set at > 90% coverage and > 90% identity  the Pacific Northwest (PNW) population [12].Interestingly, our recent virulence profiling of Pgt isolates collected from barley in eastern Washington showed that Rpg1 virulence was predominate (99%) in this Pgt population [15].This suggests that evolutionary pressure on this population selected for this virulence, however this is perplexing considering that Rpg1 had not been deployed in PNW barley varieties.To identify Pgt virulence/avirulence loci that evolved to overcome Rpg1-mediated stem rust resistance in barley, we utilized whole genome shotgun sequencing, and phenotyping data on two barley lines containing Rpg1 for these diverse isolates collected from the PNW sexual population and isolates collected from the Midwestern US asexual population for genome wide association studies (GWAS).Two loci corresponding to Rpg1 avirulence were identified.The major locus, AvrRpg1A (accounted for ~ 23% estimated phenotypic variance) mapped to a 35 kb interval on supercontig2.30that contains four candidate effector genes.The minor locus, AvrRpg1B landed on a single gene within a 78 bp region on supercontig2.11.The detection of only two dominant avirulence loci that interact with Rpg1 indicated typical gene-for-gene interactions similar to those characterized for other rust pathosystems [63,64].
In planta transcriptome data has been commonly used to identify candidate effector genes in several fungal pathogens including rust [22].One prominent limitation of transcriptomics is difficulty to identify major loci contributing to disease phenotype since several to hundreds of genes are temporally expressed during the infection process [30,50].Bi-parental mapping is another powerful genetic tool that has been used to map virulence/avirulence loci in the rust pathogen, P. striiformis f. sp.tritici (Pst), a close relative of Puccinia graminis f. sp.tritici (Pgt) [65][66][67].Unlike Pst, creating a crossing population is challenging for Pgt because Pgt teliospores exhibit a very strong dormancy for an extended period [68].To overcome these limitations with transcriptomics and biparental mapping, we utilized genome wide association studies (GWAS) to map avirulence loci corresponding to the barley Rpg1 gene.The benefit of GWAS is that it can utilize population scale variability to identify significant marker-trait associations (MTAs) linked to trait of interest [69].The power of GWAS to detect robust associations depends upon several factors including population size, type, phenotype and genotype data, and statistical tools used for association analysis [70].
The Pgt isolates utilized in this study represent a suitable population size to map virulence/avirulence loci in the Pgt genome (~ 89 Mb).Several avirulence genes were mapped across the Pst genome utilizing much smaller populations of 14 and 30 Pst individuals [71,72].In a recent review, a population size of 50-100 diverse individuals was considered adequate to precisely identify MTAs in pathogens [73].The type of population, sexual or asexual, can significantly influence the resolution of genome-wide scans [73].The Pgt population in this study comprised of a majority of isolates (n = 89) collected from the states of Washington and Idaho.The PNW region of North America (US states Washington, Idaho, and Oregon and Canadian province British Columbia) is recognized to harbor sexual population of Pgt with a high level of virulence diversity [11,13].In a recent rust survey around the Washington-Idaho border, as many as 16 Pgt races were detected in a single field [74].Roelfs and Groth [11] identified > 100 races among 426 Pgt isolates assayed from the PNW region.Higher meiotic recombination rates in the genomes of sexual populations break linkage blocks which allow for higher resolution mapping of virulence/avirulence loci utilizing GWAS [73].The linkage decay (LD) in the Pgt population utilized in this study was approximately 10 kb across the genome, suggesting a sexual population with a high level of recombination [34,35].Since the collapsed haploid genome assembly was used for genomic analysis, including LD computation, the LD value reported here might have been underestimated to a certain degree; however, lower LD value would be expected in a population dominated by sexually originated isolates.
Robust disease phenotype data was generated for 96 and 113 Pgt isolates on two sources of the Rpg1 gene, Golden Promise transgenic line H228.2c and cv Morex, respectively, by performing disease assays under temperature and light-controlled conditions.Virulence profiles were determined based on infection types and the Pgt population showed a bimodal phenotypic distribution on H228.2c with an approximately 40:60 ratio of avirulent to virulent isolates (Fig. 1).The distinct segregation for contrasting phenotypes shows that this population is suitable for mapping avirulence/virulence loci corresponding to the Rpg1 gene in the H228.2ctransgenic line.Avirulence/virulence on cv Morex was skewed where 17% of the isolates were avirulent and 83% were virulent out of 113 isolates used which included both the PNW population and Midwestern isolates.Although H228.2c carries a single copy of Rpg1 from cv Morex in the Golden Promise background, it was shown in previous studies [15,41] and in this study (Fig. 1, Supplementary Table S1) to contain an elevated level of resistance to isolates avirulent on Rpg1 compared to cv Morex.Also, a larger proportion of the isolates were avirulent on H228.2c (43%) compared to the isolates avirulent on cv Morex (17%).The level of resistance on the avirulent isolates for H228.2c is near immunity (IT range of 0 to 2 and mode of 0;) whereas cv Morex provided a range from resistant to moderately resistant (IT range of 0; to 23-and mode of 21) to the avirulent isolates.Also, all the isolates avirulent on cv Morex showed near immunity reactions on H228.2c.We hypothesize that the resistance response mediated by the AvrRpg1-Rpg1 gene-for-gene interaction is enhanced by a genetic component in the cv Golden Promise background, or some isolates contain a suppressor of resistance that functions in the cv Morex interaction but does not function in the Golden Promise transgenic line H228.2c.Since the PNW population (n = 89) has a low percentage of isolates avirulent on Rpg1 in the cv Morex background (10%) the phenotype data previously generated for 17 Midwestern US isolates [30] were added to the panel of the 96 Pgt isolates (PNW = 89, MW = 7) to increase the power of avirulence/virulence gene mapping with cv Morex.
Genotyping technologies like GBS or RNA sequencing are cheaper due to the reduced representation but present limitations to high density genotyping.One pitfall of GBS is uneven sequencing coverage resulting in a high proportion of missing data [75].RNAseq on the other hand only detects sequence variation within expressed genes [76].Due to these constraints of GBS and RNAseq, a whole genome shotgun sequencing (WGS) approach was utilized to genotype the Pgt population.One major benefit of WGS is that it allows for the detection of variation in both gene coding and non-coding regions [77].For precise variant detection high coverage sequence data is essential [77], thus 96 Pgt isolates were sequenced to an average depth of 113 × generating sequencing reads with Q30 for 94% of the bases.This high-quality sequencing data was suitable for variant calling and downstream processing.We obtained a total of 1,195,947 SNPs and 168,516 Indels for the 96 isolates.This corresponds to 13 SNPs and 2 Indels per kilobase of the Pgt genome (~ 89 Mbp) indicating high marker density for this natural Pgt population.This marker density and population size was much more robust than a recent study where 30 Pst isolates were sequenced to 30 × depth for the purpose of association mapping [72].Of the > 1 million variants identified in this Pgt population deleterious variants included splice, start, stop lost/gained, splice, missense, frameshift, conservation inframe deletion/insertion, and disruptive inframe deletion/insertion variants that accounted for 2.9 and 2.8% of total SNPs and Indel effects, respectively.
The level of genome homozygosity/heterozygosity is often used as an indicator of the mode of reproduction in an organism [78][79][80].In diploid or binuclear fungi asexual reproduction is expected to increase genome heterozygosity due to accumulation of distinct mutations in one genome copy that are unable to recombine to form homozygotes [78].Based on the rates of SNP homozygosity of isolates collected from the primary cereal hosts barley and wheat, it can be inferred that sexual reproduction is the major means of propagation for Pgt in the PNW region.However, a comparatively higher level of heterozygosity of seven cereal isolates indicated that PNW Pgt also reproduces asexually.Apart from a few exceptions, the rate of SNP homozygosity in isolates collected from cereal (barley and wheat) and alternate hosts (Mahonia and barberry) were similar.This indicated that sexual hosts are a major source of inoculum on PNW cereals.Originally, we hypothesized that six of seven isolates collected from the Midwest US are of asexual origin so should carry highly heterozygous genomes.To our surprise, the heterozygosity rate (37-44%) for these six isolates was comparable to average heterozygosity (35%) of isolates from the alternate hosts.This indicates that Midwest isolates utilized in this study may have undergone a shorter asexual reproduction period than previously expected resulting in the accumulation of fewer mutations present in the heterozygous state.This data doesn't fit with the hypothesis that these isolates were collected from a stabilized asexual population as they were collected after sexual hosts were eradicated from the Midwestern US.The midwestern isolates utilized in this study were collected from 1977 to 1999, after barberry eradication phased out in the late 1970s.The virulence profile of Midwest isolate, QCCJB resembles sexual isolates from the PNW, so it was speculated that QCCJB probably originated from the PNW [12].The WGS data generated was utilized to provide conclusive evidence that QCCJB did originate from the PNW region and was disseminated eastward into the Midwest.The level of genome homozygosity (63%) supports the sexual origin of QCCJB.Based on hierarchical clustering (Fig. 5), QCCJB is more closely related to PNW isolates collected from the alternate host barberry while the other six Midwestern isolates formed a separate sub-group within the same cluster.Initially we hypothesized that the clustering of isolates would be based on the host from which they were derived.However, two major clusters included isolates originated from both primary cereal hosts barley and wheat and alternate sexual hosts barberry and Mahonia.Interestingly, cereal isolates were clustered based on geographic location.For example, cereal isolates from the Valley ford, WA region formed close sub-groups within the first cluster and the majority of cereal isolates from the Pullman, WA region were in second and third cluster.However, sexual isolates had a lower degree of subgrouping based on the location of collection.
Association mapping identified 53 and 2 significant SNPs associated with disease phenotype on the Golden Promise transgenic line H228.2c and cv Morex, respectively.One major pitfall of GWAS is spurious association of genotype and phenotype resulting in detection of false-positive associations [81].We tried to minimize this error by incorporating structure (Q) and relatedness (K) data as covariate in the MLM model with H228.2c and structure (Q) in the BLINK model with cv Morex.Also, to alleviate the possibility of false associations stringent thresholds of 7.3 and 6.4 LOD scores were used with H228.2c and cv Morex, respectively.Two novel QTLs were identified for Avr_Rpg1, one on supercontig2.30(designated Avr_Rpg1A) and one on supercontig2.11(designated Avr_Rpg1B).The major avirulence locus on supercontig2.30represents a common locus identified for both H228.2c and cv Morex.Haplotype analysis of significant SNPs at this locus indicated that the majority of avirulent isolates share common haplotypes in the homozygous or heterozygous state whereas the virulent isolates share the alternate putative recessive virulent haplotype in the homozygous state.This suggests that this locus harbors a dominant avirulence gene shared by the majority of avirulent isolates hence is considered the major Avr_Rpg1 locus.The Avr_Rpg1B locus identified on supercontig2.11with cv Morex was considered the minor locus because several avirulent isolates carry the same allele present among virulent isolates.This indicates that only a few avirulent isolates carry this avirulence effector or possibly represents a false positive association.
Harold Henry Flor in his landmark paper described the gene-for-gene hypothesis which explained the dominant nature of pathogen avirulence genes and their specific interactions with dominant host resistance genes on the basis of specific genetic interaction of flax rust, Melamspora lini with its host, flax (Linum usitatissimum) [82].Based on the classical gene-for-gene hypothesis, the product of the dominant avirulence gene is directly or indirectly recognized by the dominant R-gene protein, resulting in resistance or an incompatible disease reaction [83].Surprisingly, in the wheat-stripe rust pathosystem, P. striiformis f. sp.tritici (Pst) avirulence corresponding to several wheat stripe rust R-genes were determined to be recessive in contrast to the well-established dogma of dominant avirulence and recessive virulence in hostpathogen genetic interaction with biotrophic pathogens [65,66].However, here we determined Pgt avirulence to Rpg1 is dominant based on the state of haplotypes (heterozygous/homozygous) carried by avirulent isolates at the major Avr_Rpg1A locus.Interestingly, the dominant avirulence haplotype was present in the heterozygous state for the majority of the avirulent isolates.Two cloned Pgt effector genes, AvrSr35 and Avr50 were also present in heterozygous state in the mutants from which they were identified [23,24].Mutation of Avr_Rpg1 could lead to loss of recognition by Rpg1, and the evolution of Rpg1 virulence in the PNW population.This hypothesis is supported by primary amino acid substitutions, N92D and R34K in two candidate effectors, PGTG_10878 and PGTG_05433, respectively.Host selection pressure is considered crucial for the evolution of virulence on deployed resistance genes in pathogen populations [84].Thus, it is quite surprising that the PNW Pgt population assayed is dominated by isolates virulent on Rpg1 considering that the R-gene has not been deployed in commercial cultivars grown in the region.Previous studies have reported several wild grasses like Elymus glaucus, Agrostis alba, Elytrigia repens, and Elymus canadensis host a native Pgt population in the PNW [13].It is possible that these wild grasses or others carry resistance mechanisms similar to Rpg1-mediated resistance that exerted selection pressure to overcome Rpg1 or there is a similar unknown resistance mechanism in wheat or the alternate sexual hosts that are applying the selection pressure for isolates virulent on Rpg1.To begin answering these questions of Rpg1 virulence evolution and mechanisms, it is essential to identify the Avr-Rpg1 effector or virulence effectors that suppress Rpg1-mediated resistance.
The major Avr_Rpg1A locus on supercontig 2.30 harbors a total of four candidate genes (PGTG_10878, PGTG_10884, PGTG_10885, and PGTG_10886) within a 35 kb interval and the minor Avr_Rpg1B locus contains a single candidate gene (PGTG_05433) within a 78 bp interval on supercontig2.11.Repeats identified within the major Avr_Rpg1A locus on supercontig 2.30 were mostly small (< 20 bp) and represented a small portion (11%) of the region (Fig. 7).Also, the gene density in this region was comparable to rest of supecon-tig2.30.In several fungi including Fusarium oxysporum f sp.lycopersici, Leptosphaeria maculans, and Zymoseptoria tritici effector genes were reported in gene poor and repeat rich regions of the genome [73].However, genomes of rust fungi including Pgt and Pst do not follow the two-speed genome model [50,85].Among the candidate genes, PGTG_10878 encodes a small protein (< 300 aa), PGTG_10884 and PGTG_10885 encode medium sized proteins (300-500 aa) typical of known secreted effectors whereas PGTG_10886 and PGTG_05433 encode larger proteins (> 500 aa).Several cloned Pgt effector genes including AvrSr35 (578 aa) [23], RGD (818 aa) [21], and VPS9 (744 aa) [21] encode larger proteins while AvrSr50 (132 aa) [24] and AvrSr27 (144 aa) [25] encode small proteins.Hence, protein size alone cannot be used to prioritize candidate effector genes in Pgt.Three candidate genes were unique to Pgt suggesting that they either evolved de novo or were acquired horizontally.Two genes were probably acquired from ancestral rust during speciation since they shared sequence similarity with other rust species.It is important to note that all effector genes are not species specific, and some are shared by several fungal species.For example, Verticillium dahliae avirulence effector, Ave1 that interacts with Ve1 in tomato has homology or orthologs in many fungal species and bacteria [86].
Effectors are pathogen molecules that evolve virulence function to successfully colonize their host by suppressing plant defense response [87] or evolve to aid in nutrient acquisition [88].These effector molecules when detected by host R-gene receptors, cytoplasmic or intracellular, become avirulence effectors as they elicit the plant resistance responses [89].However, avirulence effectors can escape R-gene mediated recognition through molecular arms race evolutionary processes as depicted by the zig-zag model proposed by [90].Avirulence genes can be disrupted through mechanisms including insertion, deletion, and nucleotide substitutions resulting in differential expression, amino acid sequence alteration and protein modification [73].Here, we show that nucleotide polymorphisms within the primary coding sequences or within regulatory regions of the candidate Avr_Rpg1A and Avr_Rpg1B avirulence genes delimited by our GWAS analysis potentially leads to loss of avirulent effector recognition by the cognate Rpg1 receptor.Since we identified candidate effector genes based on the result of genotype and phenotype association, the top candidate gene should be the one with a high degree of correlation between allele type and observed phenotypes.Based on this criterion, the gene model PGTG_10886 is the top candidate Avr_Rpg1A gene.Surprisingly, polymorphism detected on PGTG_10886 between virulent and avirulent isolates were in introns, near splice sites and 3'-regulatory region, except a SNP in exon 13 that caused a synonymous mutation (Fig. 8).Hence, functional studies will be crucial to validate our candidate genes and elucidate the role of the causal mutations in phenotypic diversity.Also, it is important to note that Pgt gene architectures are based on predictions with computer algorithms, and thus misannotation can result in incorrect predictions of causal mutations [50].Another candidate gene, PGTG_10878 identified within the AvrRpg1A locus, contains a non-synonymous mutation (N92D) which is highly correlated (~ 90%) with disease phenotype of virulent and avirulent isolates (Supplementary Table S3).Furthermore, the gene is unique to Pgt and encodes a small protein.Hence, PGTG_10878 is also a strong Avr_Rpg1A candidate gene.The candidate gene, PGTG_05433 at the minor Avr_Rpg1B locus also harbors a single nucleotide variant in its coding sequence that is predicted to cause the non-synonymous mutation, arginine to lysine at amino acid position 34.However, genotype by phenotype correlation was not strong; therefore, the Avr_Rpg1B effector is probably shared by a low proportion of avirulent isolates or this locus possibly represents a false association.Thus, two strong Avr_Rpg1A candidate genes, PGTG_10886 and PGTG_10978 will be prioritized for initial functional validation work.

Conclusions
It is crucial to understand the evolutionary mechanism of virulence on important R-genes which is especially important in the barley-Pgt pathosystem where the primary germplasm pool only contains three characterized effective stem rust resistance genes, Rpg1, RMRL, and Rpg7.Thus, to understand effective deployment of resistances the identification of avirulence loci/genes is the first step towards unravelling the evolutionary forces that enable effectors to evade or suppress R-gene mediated defense signaling.We utilized ~ 1.2 million SNPs to map avirulence loci in Pgt via a GWAS approach.A total of 55 MTAs were identified corresponding to two unique loci associated with AvrRpg1 that interact with the barley Rpg1 gene.The major Avr_Rpg1A locus harbored four candidate effector genes and the minor Avr_Rpg1B locus harbored one candidate gene.The dominant nature of the Avr_Rpg1A locus agrees with Flor's classical gene-forgene model.Several markers identified in this study perfectly correlated with pathogen phenotype, hence, they can be utilized to screen Pgt population for Rpg1 virulence.The candidate Avr_Rpg1A and Avr_Rpg1B genes identified in this study will be utilized for future functional validation work.Thus, this study begins to fill gaps in our current understanding of host-pathogen interaction in the barley-Pgt pathosystem.

Fig. 1
Fig. 1 Disease phenotype distribution of the Pgt isolates used in this study.Black dot in each violin indicates mean disease score.Disease scores are presented on a 0-5 scale.The Pgt population showed a clear unimodal disease distribution on susceptible lines, Steptoe, Harrington and Golden Promise (GP).A bimodal disease distribution was observed on Golden promise transgenic line (H228.2c),carrying the cv Morex source of the Rpg1 gene.A skewed distribution was observed on Morex, carrying the natural source of the Rpg1 gene.Barley lines, Golden Promise and Golden Promise Transgenic are abbreviated as GP and GPT, respectively in the violin plot

Fig. 3
Fig. 3 Genomewide distribution of variants (SNPs/Indels) identified among Pgt isolates (n = 96).Red bars on the outer layer indicate the length of supercontigs (n = 392 supercontigs) for the Pgt reference assembly, CRL-75-36-700-3.Each axis on this layer represents 300 kb.Blue bars show gene densities on each supercontig.Each axis represents 1 gene/10 kb.Green and purple bars indicate SNP and Indel densities, respectively.Each axis represents 4 SNPs or 1 Indel per kb.Orange bars display the densities of deleterious SNPs.Each axis represents 1 SNP per kb.Grey bars indicate the densities of deleterious Indels.Each axis represents 1 Indel per 10 kb.Deleterious refers to high and moderate impact variants based on the SnpEff tool

Fig. 4
Fig. 4 Bar charts depicting the proportion of homozygous and heterozygous SNP variants for Pgt isolates derived from A alternate hosts (n = 50) and B cereal hosts (n = 46).The gold and blue portion of each bar represents the proportion of homozygous and heterozygous SNP variants, respectively.Dotted lines indicate average homozygosity (0.63) for the Pgt population

Fig. 6
Fig. 6 Manhattan plots depicting association analyses results of Puccinia graminis f. sp.tritici (Pgt) isolate phenotypes with two barley lines carrying the Rpg1 gene; A Golden Promise transgenic line (H228.2c),and B Morex.X-axis indicates Pgt supercontigs.Y-axis represents LOD [-log10(p)] scores for SNP markers.Blue dotted line shows significant threshold of 7.3 LOD value with H228.2c and 6.4 with Morex.AvrRpg1A and AvrRpg1B are the two avirulence loci corresponding to the barley R-gene, Rpg1.AvrRpg1A was identified with both Rpg1_H228.2cand Rpg1_Morex.AvrRpg1B was identified with Rpg1_Morex only

Fig. 7
Fig. 7 Genome architecture of 35 kb AvrRpg1A locus identified with Golden Promise transgenic line (H228.2c).The outer layer shows physical position (kb) on supercontig2.30.Orange tiles indicate the gene space of four candidate genes in the region.Predicted genes are labelled right above tiles along with the orientations (F = forward and R = reverse).Red and blue tiles represent smaller repeats (< 300 bp) and larger repeats (> 300 bp), respectively.Purple bars indicate total SNPs per 200 bp interval.Each axis represents 2 SNPs.Green bars depict total Indels per 200 bp.Two axes depict 1 Indel

Fig. 8 Fig. 9
Fig.8SNP variants in candidate effector genes underlying the AvrRpg1A and AvrRpg1B loci.For each candidate gene, exons are represented by blue boxes, intron/s by thin black lines, and putative 5' and 3' regions (500 bp upstream/downstream) by thick black lines.Red arrows represent SNPs.Red stars show non-synonymous mutations.Four effector gene candidates, PGTG_10878, PGTG_10884, PGTG_10885, and PGTG_10886 correspond to the AvrRpg1A locus and were identified with Rpg1_H228.2cdisease reactions.For PGTG_10878, SNPs are in exon 1 and 3' region.The single exonic SNP is non-synonymous mutation, N92D.For PGTG_10884, SNPs are in the putative promoter region.For PGTG_10885, SNPs are in intron 5 and 3' region.For PGTG_10886, SNPs are in intron 10, intron 11, intron 12, exon 13, and 3' region.The single exonic SNP is a synonymous mutation.Two effector gene candidates, PGTG_10886 (AvrRpg1A) and PGTG_05433 (AvrRpg1B) were identified with the Rpg1_Morex disease reactions.For PGTG_10886, a single SNP is in 3' region.For PGTG_05433, the single exonic SNP results in the predicted non-synonymous mutation, R34K

Table 1
Source, improvement status, and stem rust R-gene present in the barley genotypes used in this study

Table 3
Significant markers trait associations identified with Rpg1_transgenic line H228.2c